AITopics | compositional de-attention network

Compositional De-Attention Networks

Neural Information Processing Systems | Dec-25-2025, 02:00:39 GMT

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., assigning a different weight to input values. This paper proposes a new quasi-attention that is compositional in nature, i.e., learning whether to add, subtract or nullify a certain vector when learning representations. This is strongly contrasted with vanilla attention, which simply re-weights input tokens. Our proposed Compositional De-Attention (CoDA) is fundamentally built upon the intuition of both similarity and dissimilarity (negative affinity) when computing affinity scores, benefiting from a greater extent of expressiveness. We evaluate CoDA on six NLP tasks, i.e., open-domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis and text2code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks/datasets.
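
The contrast the abstract draws between re-weighting and add/subtract/nullify can be illustrated with a small NumPy sketch. This is not taken from the paper's text as shown here: the specific composition (a tanh of a dot-product similarity, gated by a sigmoid of a centred negative L1 distance), the scaling, and the function names are all assumptions chosen to produce signed weights in (-1, 1), and the inputs are random toy data.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_attention_weights(q, k):
    """Softmax weights: every entry is positive and each row sums to 1."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def quasi_attention_weights(q, k):
    """Signed weights built from two affinity matrices (assumed form)."""
    # Similarity affinity: scaled dot product, squashed to (-1, 1) below.
    e = q @ k.T / np.sqrt(q.shape[-1])
    # Dissimilarity affinity: negative pairwise L1 distance (assumption).
    n = -np.abs(q[:, None, :] - k[None, :, :]).sum(axis=-1)
    n = n - n.mean()  # centre so the gate can fall on either side of 0.5
    gate = 1.0 / (1.0 + np.exp(-n))
    # Positive entries add a value vector, negative entries subtract it,
    # and entries near zero nullify it.
    return np.tanh(e) * gate

rng = np.random.default_rng(0)   # toy inputs, not data from the paper
q = rng.normal(size=(4, 8))      # 4 query vectors
k = rng.normal(size=(5, 8))      # 5 key vectors
v = rng.normal(size=(5, 8))      # 5 value vectors

w_soft = vanilla_attention_weights(q, k)
w_quasi = quasi_attention_weights(q, k)
out = w_quasi @ v                # signed combination of value vectors
```

Under these assumptions, `w_soft` can only redistribute positive mass across tokens, while `w_quasi` can carry either sign, which is what lets a value vector be subtracted rather than merely down-weighted.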

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Reviews: Compositional De-Attention Networks

Neural Information Processing Systems | Jan-21-2025, 23:22:57 GMT

UPDATE after reading author rebuttal: I am looking forward to the more comprehensive evaluation that you are carrying out. Regarding Q3, please include details of the setup in the main paper. Also, the main paper needs more analysis of why zeroes are predominant in M (a point also raised by R3), rather than speculation or hypothesis. Overall, my opinion of the paper does not change, and I feel it is a good direction of research.

This paper proposes an alternative to the softmax-based attention mechanism, a quasi-attention technique. A dual affinity matrix approach is proposed in place of the usual single affinity matrix. One affinity matrix is created from the pairwise similarity computation.
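
The reviewer's point about zeroes being predominant in M suggests a simple measurement that could accompany such an analysis: the share of entries in a signed weight matrix that add, subtract, or nullify. The sketch below is only illustrative. The helper name `composition_profile`, the threshold `eps`, and the toy matrix are all hypothetical and are not values or results from the paper.

```python
import numpy as np

def composition_profile(m, eps=0.05):
    """Fraction of entries that add (> eps), subtract (< -eps), or nullify (|m| <= eps)."""
    m = np.asarray(m, dtype=float)
    return {
        "add": float((m > eps).mean()),
        "subtract": float((m < -eps).mean()),
        "nullify": float((np.abs(m) <= eps).mean()),
    }

# Hand-written toy matrix standing in for M; the values are made up.
m = np.array([[0.9, 0.0, -0.7, 0.01],
              [0.0, 0.02, 0.0, -0.03]])
profile = composition_profile(m)
```

Tracking these three fractions over training, or per layer, would be one way to turn the observation about zeroes into a quantitative statement; the choice of `eps` determines what counts as nullified and would need to be justified.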

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Compositional De-Attention Networks

Neural Information Processing Systems | Oct-9-2024, 14:27:11 GMT

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.30)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.30)

Compositional De-Attention Networks

Tay, Yi, Luu, Anh Tuan, Zhang, Aston, Wang, Shuohang, Hui, Siu Cheung

Neural Information Processing Systems | Mar-18-2020, 23:01:51 GMT

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.30)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.30)
